Estimating the impact of environmental management on strawberry yield using publicly available agricultural data in South Korea

Advanced information and communication technologies (ICTs) have made data collection more efficient for agricultural studies. Using a publicly available database in South Korea, we estimated the relationship between the management of air temperature and relative humidity and strawberry yield during two harvest seasons. Longitudinal data from multiple greenhouses were merged and processed, and mixed-effects models were applied to account for both observed and unobserved factors across the greenhouses. The averages of air temperature and relative humidity inside each greenhouse do not take the volatility of these time-varying variables into consideration, so we assessed the management of each greenhouse by the percent of time that air temperature was between 15 °C and 20 °C (denoted by T%) and the percent of time that relative humidity was between 0% and 50% (denoted by H%). The statistical models estimated that strawberry yield decreases with the number of days since harvest began and that the rate of decrease is slower when T% and H% are higher. This study used large-scale multilocation data to provide the practical suggestion that air temperature and relative humidity should be maintained within the optimal ranges to mitigate the loss of strawberry yield, especially in the later phase of a harvest season.


INTRODUCTION
Advanced ICTs have generated big data in various fields. Big data usually refers to a large amount of structured or unstructured information that is complex and difficult to process using existing databases and management processes (Kim & Lee, 2020; Manyika et al., 2011). Agricultural researchers can benefit from big data generated during the cultivation process, which contain growth variables, environmental conditions, and yields. In this study, we utilize such data to accurately quantify the relationship between the management of air temperature and relative humidity and the production of strawberry.
It is very hard for researchers to monitor and record strawberry data from multiple locations across the country. Therefore, the publicly available strawberry data provided in South Korea are a valuable resource for research purposes. In our literature review, most studies used the averages of environmental variables to understand greenhouse conditions. However, indoor greenhouse conditions are highly sensitive to outdoor conditions, which vary throughout the day, and the averages do not take the volatility of time-varying conditions into consideration. Rather than the averages, the percent of time that air temperature and relative humidity are in an optimal range better reflects greenhouse management. In addition, strawberry yield tends to decrease throughout a harvest season. In this study, the large-scale longitudinal data are analyzed using mixed-effects models to accurately estimate the impact of maintaining air temperature and relative humidity in an optimal range on the trend of yield loss with respect to harvest time (the number of days since harvest began).

Data collection
For a pilot study, we used an open API dataset made available through a South Korean public database portal (Ministry of Interior & Safety of South Korea, 2021). It included 13 farms during the 2017-18 season. For the primary study, we used a public dataset made available by the South Korean government (Ministry of Interior & Safety of South Korea, 2022). It included 78 farms across nine provinces in South Korea during the 2020-21 season. The 'Seolhyang', 'Keumsil', and 'Jukhyang' cultivars were included in the data, and 'Seolhyang' was the most commonly grown cultivar in the greenhouses. Air and soil temperatures, relative humidity, solar radiation, soil water content, precipitation, CO2 concentration, EC, pH, and the number of fruits produced were collected from each farm. Figure 1 provides the map of South Korea and indicates the strawberry farms included in the pilot and primary studies. Table 1 provides information on the strawberry farms and environmental variables for the primary study, and Table 2 provides the same information for the pilot study. Some variables were not recorded for all time periods across the season, but air temperature and relative humidity were recorded for most of the time periods. To keep the aims of this study feasible and specific, we focused on describing and estimating the relationship between air temperature and relative humidity and the fruit yield from February to June of 2021. We curated and analyzed data on the air temperature and relative humidity inside greenhouses and the number of fruits per plant during the 5-month period. The same variables were available in the pilot data, so we were able to apply the same statistical models to both the pilot data (13 farms in the 2017-18 season) and the primary data (78 farms in the 2020-21 season).

Statistical analysis
For statistical modeling, the outcome (dependent) variable of interest was the number of fruits per plant, and the explanatory (independent) variables were air temperature (°C), relative humidity (%), and month. The month is an important variable because the strawberry plant tends to produce less fruit, on average, as harvest time progresses. Hereafter, the month is denoted by M = 2, 3, 4, 5, and 6 for February, March, April, May, and June, respectively. In addition to these observable factors (air temperature, relative humidity, and month), there might be unobserved factors that vary across farms (e.g., greenhouse management skill, equipment). Each farm was repeatedly observed in the dataset; therefore, a mixed-effects model was suitable to account for both fixed effects (air temperature, relative humidity, and month) and random effects (farms). In the datasets, the air temperature and relative humidity were recorded hourly. Using the hourly information, the percent of time that the air temperature was between 15 °C and 20 °C was calculated. Hereafter, this variable is denoted by T%, and it was used in the mixed-effects model. Similarly, the percent of time that the relative humidity was between 0% and 50% was calculated. Hereafter, this variable is denoted by H%, and it was used in the model. Note that T% and H% were chosen, instead of the average air temperature and relative humidity, to assess the management of greenhouse environments. High values of T% and H% imply low volatility around the optimal ranges, which cannot be captured by the averages. For instance, if the air temperature was always 10 °C during the night and always 30 °C during the day, with night and day of equal length, the daily average would be 20 °C, but T% = 0.
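The calculation of T% and H% from hourly readings can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `percent_in_range`, the inclusive treatment of the range endpoints, and the hourly values below are all assumptions made for the example.

```python
def percent_in_range(readings, low, high):
    """Percent of readings that fall within [low, high].

    Treating both endpoints as inclusive is an assumption; the text
    only says "between".
    """
    if not readings:
        raise ValueError("readings must not be empty")
    inside = sum(1 for value in readings if low <= value <= high)
    return 100.0 * inside / len(readings)


# Hypothetical day from the text: 12 night hours at 10 °C, 12 day hours at 30 °C.
night_day = [10.0] * 12 + [30.0] * 12
mean_temp = sum(night_day) / len(night_day)
t_pct_volatile = percent_in_range(night_day, 15.0, 20.0)

# Hypothetical well-managed day: a steady 18 °C for 24 hours.
steady = [18.0] * 24
t_pct_steady = percent_in_range(steady, 15.0, 20.0)

# Hypothetical humidity record: 6 ventilated hours at 45%, 18 hours at 90%.
hourly_rh = [45.0] * 6 + [90.0] * 18
h_pct = percent_in_range(hourly_rh, 0.0, 50.0)

print(mean_temp, t_pct_volatile)  # 20.0 0.0
print(t_pct_steady)               # 100.0
print(h_pct)                      # 25.0
```

The first case has a daily mean of exactly 20 °C yet spends no time inside the 15-20 °C range, which is the situation the averages cannot distinguish from a steadily managed greenhouse.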
The number of fruits per plant was transformed using the natural logarithm to satisfy the normal error assumption, and the average of the log-transformed value is denoted by µ hereafter. Two mixed-effects models were used to explain µ as a function of T%, H%, and M (fixed effects) and farms (random effects). For the first model, denoted by Model 1, the average was specified as $\mu = \beta_0 + \beta_1 M + \beta_2 T_{\%} + \beta_3 H_{\%}$. Under this model, the null hypothesis was $H_0: \beta_2 = \beta_3 = 0$, and the alternative hypothesis was $H_1: \beta_2 > 0$ and $\beta_3 > 0$. In other words, we tested whether a higher number of fruits is expected when T% and H% are higher in a given month. For the second model, denoted by Model 2, the average was specified as $\mu = \beta_0 + \beta_1 M + \beta_2 T_{\%} + \beta_3 H_{\%} + \beta_4 (M \times T_{\%}) + \beta_5 (M \times H_{\%})$. Under this model, the null hypothesis was $H_0: \beta_4 = \beta_5 = 0$, and the alternative hypothesis was $H_1: \beta_4 > 0$ and $\beta_5 > 0$. In other words, we tested whether the decrease in expected yield over harvest time is mitigated when the values of T% and H% are high. The number of fruits per plant was also recorded weekly for most of the evaluation period. For both Model 1 and Model 2, we considered both the weekly and the monthly average number of fruits to examine whether the statistical inference would be sensitive to the choice between the two. For modeling the weekly average, zeroes were removed from the analysis, and M (month) was replaced by W (week). Both Model 1 and Model 2 were fitted to both the 2017-18 dataset (13 farms) and the 2020-21 dataset (78 farms).

Figure 2 shows that the expected number of fruits per plant decreased over time (monthly) in the 2020-21 dataset. Each farm has its own management practices and experiences, and the magnitude of the negative slopes (i.e., decreasing yield) varied across the farms. Figure 3 shows the management of air temperature and of relative humidity across the farms.
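For reference, the two specifications from the statistical analysis can be written out at the observation level. The text states only that farms enter as random effects, so the farm-level random intercept $b_i$, the error term $\varepsilon_{ij}$, and the indices $i$ (farm) and $j$ (month) below are assumptions made for this sketch, as is treating T% and H% as one value per farm.

```latex
\begin{align}
\text{Model 1:}\quad
  \log y_{ij} &= \beta_0 + \beta_1 M_{j} + \beta_2 T_{\%,i} + \beta_3 H_{\%,i}
                 + b_i + \varepsilon_{ij}, \\
\text{Model 2:}\quad
  \log y_{ij} &= \beta_0 + \beta_1 M_{j} + \beta_2 T_{\%,i} + \beta_3 H_{\%,i}
                 + \beta_4 \,(M_{j} \times T_{\%,i})
                 + \beta_5 \,(M_{j} \times H_{\%,i})
                 + b_i + \varepsilon_{ij}, \\
  b_i &\sim N(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim N(0, \sigma^2).
\end{align}

% Slope of the expected log-yield with respect to month:
\begin{align}
\text{Model 1:}\quad \frac{\partial \mu}{\partial M} &= \beta_1, \\
\text{Model 2:}\quad \frac{\partial \mu}{\partial M} &= \beta_1 + \beta_4 T_{\%} + \beta_5 H_{\%}.
\end{align}
```

Under Model 2, positive values of $\beta_4$ and $\beta_5$ make the monthly slope less negative as T% and H% increase, which is what the alternative hypothesis for that model expresses.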
On average, the farms maintained the air temperature between 15 °C and 20 °C for about 20% of the time and the relative humidity between 0% and 50% for about 17% of the time during the observation period (February to June). The figure also shows that the management of air temperature and of relative humidity varies across the farms. Table 3 quantifies the monthly relationship between the expected yield and T% and H% via the regression parameters estimated by the mixed-effects model. The left columns of Table 3 are for the 2020-21 data, and the right columns are for the 2017-18 data. Focusing on the 2020-21 data (78 farms), Model 1 showed that T% and H% were not significantly related to the monthly expected yield (p = 0.66 and 0.94, respectively), but it showed that the monthly expected yield decreased with respect to month (p = 0.0001). Model 2, however, revealed that both T% and H% are related to the slope of yield with respect to month (p = 0.03 and 0.0022, respectively). The estimates of both $\beta_4$ and $\beta_5$ were positive, which is interpreted to mean that higher T% and H% reduced the magnitude of the negative slope of yield with respect to month. A similar trend was observed in the pilot 2017-18 data (13 farms). The statistical significance of the interaction between T% and month was stronger in the 2017-18 data (p < 0.0001) than in the 2020-21 data (p = 0.03), and the interaction between H% and month was weaker in the 2017-18 data (p = 0.23) than in the 2020-21 data (p = 0.0022). We note that Model 1 and Model 2 have different objectives. Model 1 explains the expected strawberry yield using T%, H%, and M, assuming that the slope of the yield with respect to M is constant given T% and H%, whereas Model 2 assumes that the slope of the yield with respect to M depends on T% and H%. The results for the 2020-21 data indicate that, compared to Model 1, Model 2 better explains the impact of the management of air temperature and relative humidity on the yield.
In other words, while the expected yield decreases as the plant continues to produce fruit, high values of T% and H% are important for mitigating the decrease in yield with respect to month. Figure 4 shows the expected yield (the log-transformed average number of fruits per plant) at T% = 10 and 30 and H% = 10, 20, and 30 using the regression parameters estimated from the 2020-21 data. According to the model estimates, the expected yield would decline substantially over time when T% = 10 and H% = 10 (the left panel of Fig. 4), whereas it would be well maintained for the 5-month period when T% = 30 and H% = 30 (the right panel of Fig. 4). For instance, over the 5-month period, the estimated median number of fruits per plant decreased by 50% when T% = 10 and H% = 10, whereas the reduction was only 9% when T% = 30 and H% = 30. Finally, Table 4 summarizes the model estimates for the weekly average number of fruits, and similar patterns were observed when compared to the monthly average number of fruits for the two seasons (Table 3).
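Because the models are fitted on the log scale, a monthly slope translates into a multiplicative change in the median number of fruits. The Python sketch below illustrates that arithmetic under stated assumptions: the 5-month period is taken to span four month-steps (M = 2 to M = 6), the function names are hypothetical, and the coefficients `B1`, `B4`, and `B5` are placeholders chosen only to show the mechanics; they are not the estimates reported in Table 3.

```python
import math


def model2_slope(b1, b4, b5, t_pct, h_pct):
    """Monthly slope of the expected log-yield under Model 2."""
    return b1 + b4 * t_pct + b5 * h_pct


def median_reduction(slope, delta_m):
    """Fractional drop in the median yield after delta_m months."""
    return 1.0 - math.exp(slope * delta_m)


def implied_slope(reduction, delta_m):
    """Monthly log-scale slope that produces a given fractional drop."""
    return math.log(1.0 - reduction) / delta_m


DELTA_M = 4  # assumption: February (M = 2) to June (M = 6)

# Slopes implied by the reductions quoted in the text (50% and 9%).
slope_low = implied_slope(0.50, DELTA_M)
slope_high = implied_slope(0.09, DELTA_M)
print(round(slope_low, 3), round(slope_high, 3))  # -0.173 -0.024

# Placeholder coefficients (NOT from Table 3), used only to show that
# positive interaction terms flatten the decline as T% and H% rise.
B1, B4, B5 = -0.2, 0.002, 0.002
slope_poor = model2_slope(B1, B4, B5, t_pct=10, h_pct=10)
slope_good = model2_slope(B1, B4, B5, t_pct=30, h_pct=30)
print(slope_good > slope_poor)  # True
```

Read this way, the 50% and 9% reductions quoted above correspond to monthly log-scale slopes of roughly -0.17 and -0.02, provided the four-step assumption holds.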

DISCUSSION
Past studies have shown that environmental variables controlled in greenhouses during strawberry cultivation can affect both growth and fruit yield (Sim et al., 2020). Sim et al. (2020) predicted strawberry growth and fruit yield from air and soil temperatures, relative humidity, soil moisture content, EC, photosynthetically active radiation, and vapor pressure deficit. These environmental variables predicted the growth and fruit yield with high correlation coefficients. However, it is not easy to collect all environmental variables from all commercial farms because of high cost and low need. Some farms may be unable to install certain sensors due to structural issues. The challenges of missing variables were observed in the dataset (Tables 1 and 2). Therefore, the most widely used and influential environmental variables were selected for this study. Herein, daily air temperature and relative humidity were considered the main environmental factors for the following four reasons. First, the effect of air temperature and relative humidity on fruit yield has been extensively evaluated in a variety of fruits and vegetables including strawberry (Demirsoy et al., 2007; Ledesma & Sugiyama, 2005), tomato (Abdalla & Verkerk, 1968; Harel et al., 2014), and sweet pepper (Bakker, 1989; Khah & Passam, 1992). Second, as mentioned above, the two factors are easily observable in all farms; hence, the statistical models are applicable to most farms (Tables 1 and 2). Third, daily air temperature and relative humidity reflect other key environmental factors because of their high correlations. Ahn et al. (2021) and Jo et al. (2021) showed that the trend in air temperature was similar to that of soil temperature and photosynthetically active radiation, and the trend in relative humidity was related to vapor pressure deficit.
Lastly, in our previous study, the correlation coefficients between daily air temperature and relative humidity and strawberry fruit yield were 0.82 and −0.93, respectively (Sim et al., 2020). Most studies have considered the averages of air temperature and relative humidity to explain an outcome variable. However, fruit yield was not significantly related to the averages of air temperature and relative humidity in this study. Instead, we used the statistical models after considering the following. First, we chose an optimal air temperature range, between 15 °C and 20 °C, and evaluated the management of air temperature by estimating the percent of time that air temperature was within the range for each farm. The observed air temperature ranged between 5 °C and 40 °C across the farms, and the average air temperature might not be an accurate measure of the management of air temperature. We considered that good management entails reducing the volatility around an ideal air temperature, if such an ideal point exists. Strawberry growth is optimal at an air temperature of 23-28 °C during daytime and 5-10 °C during nighttime (Takei, 2010). On the other hand, a low air temperature changes strawberry fruit size and color and potentially damages the fruit, and a high air temperature potentially reduces the photosynthetic rate, fruit yield, and sugar content of the fruit (Ariza et al., 2012; Wang & Camp, 2000). In this study, the air temperature range of 15-20 °C was chosen based on suggestions in the literature (Bish, Cantliffe & Chandler, 2002; Kadir, Sidhu & Al-Khatib, 2006). Second, the winter in South Korea is dry, but the relative humidity in greenhouses is very high (90% or above). Hence, farmers should reduce the internal relative humidity to facilitate transpiration and prevent diseases. As shown in Fig. 3, many farms observed in this study struggled to maintain a low relative humidity inside greenhouses.
Even though a suggested range of relative humidity is 60-80% (Choi, Chung & Suh, 1997), we evaluated the management of relative humidity at 50% or below. This is because relative humidity decreases rapidly when ventilation is applied and rises again immediately after ventilation stops. In this regard, the percent of time that relative humidity is 50% or below (H%) reflects the level of effort to lower the relative humidity during a high-humidity season in South Korea. Therefore, the results should be generalized with care. They do not necessarily imply that the driest conditions are optimal, as fogging could be beneficial in a dry environment (Morgan, 2006). Third, although all farms have different expected fruit yields for various reasons, the gradually decreasing trend of fruit yield throughout a harvest season is a natural phenomenon observed in most farms (Fig. 2). In this regard, it was reasonable to hypothesize that the downtrend can be mitigated by managing the indoor air temperature and relative humidity, and we used the statistical models to explain the "slope" of fruit yield with respect to harvest time (the number of days since harvest began). Furthermore, the mixed-effects model accounts for unobserved variables such as leaf mass per unit leaf area and light intensity (Bertin & Gary, 1998; Chatterton, Lee & Hungerford, 1972; Reddy et al., 1989), CO2 concentration, and air temperature (Bertin & Gary, 1998; Acock, Charles-Edwards & Sawyer, 1979; Charles-Edwards, 1979; Leadley & Reynolds, 1989). As a result, it is more reasonable to conclude that the impact of air temperature and relative humidity is especially significant in the later phase of a harvest season than to conclude that the impact is constant throughout the harvest season (Tables 3 and 4).

CONCLUSIONS
Based on publicly available data collected from a large number of farms across South Korea, we evaluated the management of air temperature (15-20 °C) and relative humidity (50% or below) at each greenhouse. Using the percent of time within the optimal ranges, we conclude that the impact of indoor air temperature and relative humidity is especially significant in the later phase of a harvest season. Therefore, strawberry farmers need to continually manage air temperature and relative humidity through ventilation to reduce the loss of yield at the end of a harvest season. If a harvest season falls in winter, as in South Korea, active management such as heating and forced ventilation may be required.